Effects of Stop Words Elimination for Arabic Information Retrieval: A Comparative Study

Author

  • Ibrahim Abu El-Khair
Abstract

This study investigated the effectiveness of three stop word lists for Arabic information retrieval: a general stoplist, a corpus-based stoplist, and a combined stoplist. Three popular weighting schemes were examined: the inverse document frequency weight, probabilistic weighting, and statistical language modelling. The idea is to combine statistical approaches with linguistic approaches to reach optimal performance and to compare their effect on retrieval. The LDC (Linguistic Data Consortium) Arabic Newswire data set was used with the Lemur Toolkit. The Best Match weighting scheme (BM25) used in the Okapi retrieval system had the best overall performance of the three weighting algorithms used in the study. Stoplists improved retrieval effectiveness, especially when used with the BM25 weight. The overall performance of the general stoplist was better than that of the other two lists.
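To make the interaction between a stoplist and the BM25 weight concrete, the following is a minimal sketch in plain Python. It is not the Lemur Toolkit's implementation and does not reproduce the study's setup: the function names `tokenize` and `bm25_scores`, the whitespace tokenizer, the parameter defaults `k1=1.2` and `b=0.75`, and the toy English documents are all assumptions made for illustration. It uses one common BM25 variant with a smoothed IDF term.

```python
import math
from collections import Counter


def tokenize(text, stoplist=frozenset()):
    """Split on whitespace and drop tokens found in the stoplist (hypothetical helper)."""
    return [tok for tok in text.split() if tok not in stoplist]


def bm25_scores(query, docs, stoplist=frozenset(), k1=1.2, b=0.75):
    """Score every document against the query with a common BM25 variant.

    k1 and b are conventional defaults, not values taken from the study.
    """
    doc_tokens = [tokenize(d, stoplist) for d in docs]
    query_terms = tokenize(query, stoplist)
    n_docs = len(doc_tokens)
    avg_len = sum(len(toks) for toks in doc_tokens) / n_docs if n_docs else 0.0

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in doc_tokens:
        df.update(set(toks))

    scores = []
    for toks in doc_tokens:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF keeps the weight positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy data (placeholders, not the LDC Arabic Newswire collection).
docs = ["the cat sat on the mat", "the dog barked", "a report on the economy"]
stoplist = frozenset({"the", "on", "a"})

with_stop = bm25_scores("the cat", docs, stoplist)
without_stop = bm25_scores("the cat", docs)
```

In this toy example, removing stop words means only the document containing "cat" receives a nonzero score, whereas without the stoplist the shared word "the" gives every document some score. This illustrates how a stoplist can interact with the weighting scheme, without implying anything about the size of the effect reported in the study.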

Similar resources

Estimating the Parameters for Linking Unstandardized References with the Matrix Comparator

This paper discusses recent research on methods for estimating configuration parameters for the Matrix Comparator used for linking unstandardized or heterogeneously standardized references. The matrix comparator computes the aggregate similarity between the tokens (words) in a pair of references. The two most critical parameters for the matrix comparator for obtaining the best linking results a...

Query Term Selection Strategies for Web-based Chinese Factoid Question Answering

Passage retrieval plays an important role in a Chinese factoid Question Answering (QA) system. Query term selection is the process of choosing keywords from a given question to make the most use of information retrieval engines. Query terms selected by humans are analyzed to measure the difficulty and for evaluating machine generated results. Three approaches, namely stop words elimination, rul...

Photo catalytic removal of Toluene vapor from air in the Adsorption-Photo catalytic bed

Background and aims: Clean air is one of the most important components of health and sustainable development. Every person breathes about 10 kg of air per day and if it contains pollutants, it will have a serious impact on their health. Indoor air quality (IAQ) is one of the major health issues that have been addressed in recent years with changes in lifestyle patterns. Usually, due to the incr...

Carbon Monoxide Removal Using Cold Plasma

Background and aims: Non-thermal plasma is now considered a successful new technology with high efficiency in air pollution control and is a focus of researchers' attention. Various types of atmospheric pollutants adversely affect human health and the environment regionally and globally. Carbon monoxide has been introduced as a critical pollutant wh...

Publication date: 2017